'Size of original data set (rows, cols)'
(2666, 18)
Data Preview
|  | last_purchase | max_discount | shoe_spend | apparell_spend | acc_spend | custserv_calls | churn | acc_purchasers | promo_purchaser | shoe_orders | apparel_orders | acc_orders | gender | ecommShopper | bhShopper | state | area_code | phone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 56.5 | 0.26 | 322.2 | 194.3 | 126 | 1 | 0 | 0 | 1 | 3 | 2 | 4 | Male | False | True | MS | 510 | 402-5509 |
| 1 | 84.0 | 0.46 | 279.1 | 170.9 | 92 | 0 | 0 | 0 | 1 | 2 | 2 | 3 | Male | False | False | OH | 510 | 370-3021 |
| 2 | 96.0 | 0.00 | 294.7 | 306 | 96 | 1 | 1 | 0 | 0 | 2 | 3 | 3 | Female | True | False | MI | 415 | 373-1448 |
| 3 | 62.0 | 0.00 | 255.4 | 185.6 | 100 | 2 | 0 | 0 | 0 | 2 | 2 | 3 | Male | False | False | VT | 510 | 403-1769 |
| 4 | 45.0 | 0.28 | 300.6 | 197.9 | 154 | 0 | 0 | 0 | 1 | 3 | 2 | 5 | Male | False | True | WV | 408 | 405-9384 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2661 | 83.0 | 0.29 | 276.1 | 193.5 | 97 | 2 | 0 | 0 | 1 | 2 | 2 | 3 | Male | True | True | ID | 510 | 399-7029 |
| 2662 | 71.0 | 0.00 | 154.7 | 230.1 | 135 | 5 | 1 | 0 | 0 | 1 | 3 | 4 | Male | False | False | OK | 408 | 345-1524 |
| 2663 | 82.0 | 0.00 | 221.9 | 92.2 | 113 | 3 | 0 | 0 | 0 | 2 | 2 | 4 | Male | False | True | MA | 415 | 419-2767 |
| 2664 | 79.5 | 0.42 | 279.3 | 158.2 | 113 | 0 | 0 | 0 | 1 | 2 | 2 | 4 | Male | False | True | VT | 415 | 403-5552 |
| 2665 | 74.0 | 0.41 | 201.8 | 170.8 | 103 | 5 | 1 | 1 | 1 | 2 | 2 | 3 | Male | False | True | VT | 510 | 378-3508 |
2666 rows × 18 columns
| Column | Dtype |
|---|---|
| last_purchase | float64 |
| max_discount | float64 |
| shoe_spend | float64 |
| apparell_spend | object |
| acc_spend | int64 |
| custserv_calls | int64 |
| churn | int64 |
| acc_purchasers | int64 |
| promo_purchaser | int64 |
| shoe_orders | int64 |
| apparel_orders | int64 |
| acc_orders | int64 |
| gender | object |
| ecommShopper | bool |
| bhShopper | bool |
| state | object |
| area_code | int64 |
| phone | object |
pandas-profiling provides fairly comprehensive data exploration, quickly revealing potential missing data, datatype problems, and correlations between independent variables (IVs).
drop phone column due to high cardinality
clean incorrect dtypes
fill missing values (since there are very few)
convert categoricals to 1-hot
separate DV from IVs
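The five steps above could be sketched roughly as follows. This is a minimal illustration rather than the notebook's actual cell: the helper name `preprocess`, the toy rows, the `"?"` placeholder for a bad value, the use of mean imputation, and the decision to treat `area_code` as categorical are all assumptions.

```python
import pandas as pd

# Columns treated as categorical (assumption: area_code is a label, not a quantity).
CATEGORICAL = ["gender", "ecommShopper", "bhShopper", "state", "area_code"]

def preprocess(raw: pd.DataFrame, target: str = "churn"):
    """Hypothetical helper: returns (X, y) ready for modeling."""
    # 1. Drop the near-unique phone column; drop() returns a new frame.
    df = raw.drop(columns=["phone"])
    # 2. Coerce the object-typed spend column to float; bad strings become NaN.
    df["apparell_spend"] = pd.to_numeric(df["apparell_spend"], errors="coerce")
    # 3. Fill the few missing values with the column mean (assumed strategy).
    df["apparell_spend"] = df["apparell_spend"].fillna(df["apparell_spend"].mean())
    # 4. One-hot encode the categorical columns.
    df = pd.get_dummies(df, columns=CATEGORICAL)
    # 5. Separate the dependent variable (DV) from the independent variables (IVs).
    return df.drop(columns=[target]), df[target]

# Toy rows (illustrative values only, not the real data set).
toy = pd.DataFrame({
    "apparell_spend": ["194.3", "170.9", "?"],
    "churn": [0, 0, 1],
    "gender": ["Male", "Male", "Female"],
    "ecommShopper": [False, False, True],
    "bhShopper": [True, False, False],
    "state": ["MS", "OH", "MI"],
    "area_code": [510, 510, 415],
    "phone": ["402-5509", "370-3021", "373-1448"],
})
X, y = preprocess(toy)
print(X.columns.tolist())
```

Assigning whole columns on a freshly built frame, instead of writing into a slice of another frame, keeps the cleaning steps free of chained-assignment ambiguity.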
|  | last_purchase | max_discount | shoe_spend | apparell_spend | acc_spend | custserv_calls | churn | acc_purchasers | promo_purchaser | shoe_orders | apparel_orders | acc_orders | gender | ecommShopper | bhShopper | state | area_code | phone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 707 | 66.0 | 0.0 | 314.8 | 171.388105 | 116 | 0 | 1 | 0 | 0 | 3 | 2 | 4 | Female | False | True | CA | 408 | 329-9067 |
<class 'pandas.core.frame.DataFrame'> RangeIndex: 2666 entries, 0 to 2665. Data columns (total 74 columns):

| # | Column | Non-Null Count | Dtype |
|---|---|---|---|
| 0 | last_purchase | 2666 non-null | float64 |
| 1 | max_discount | 2666 non-null | float64 |
| 2 | shoe_spend | 2666 non-null | float64 |
| 3 | apparell_spend | 2666 non-null | float64 |
| 4 | acc_spend | 2666 non-null | int64 |
| 5 | custserv_calls | 2666 non-null | int64 |
| 6 | churn | 2666 non-null | int64 |
| 7 | acc_purchasers | 2666 non-null | int64 |
| 8 | promo_purchaser | 2666 non-null | int64 |
| 9 | shoe_orders | 2666 non-null | int64 |
| 10 | apparel_orders | 2666 non-null | int64 |
| 11 | acc_orders | 2666 non-null | int64 |
| 12 | gender_Female | 2666 non-null | uint8 |
| 13 | gender_Male | 2666 non-null | uint8 |
| 14 | ecommShopper_False | 2666 non-null | uint8 |
| 15 | ecommShopper_True | 2666 non-null | uint8 |
| 16 | bhShopper_False | 2666 non-null | uint8 |
| 17 | bhShopper_True | 2666 non-null | uint8 |
| 18 | state_AD | 2666 non-null | uint8 |
| 19 | state_AK | 2666 non-null | uint8 |
| 20 | state_AL | 2666 non-null | uint8 |
| 21 | state_AR | 2666 non-null | uint8 |
| 22 | state_ARZ | 2666 non-null | uint8 |
| 23 | state_AZ | 2666 non-null | uint8 |
| 24 | state_CA | 2666 non-null | uint8 |
| 25 | state_CO | 2666 non-null | uint8 |
| 26 | state_CT | 2666 non-null | uint8 |
| 27 | state_DC | 2666 non-null | uint8 |
| 28 | state_DE | 2666 non-null | uint8 |
| 29 | state_FL | 2666 non-null | uint8 |
| 30 | state_GA | 2666 non-null | uint8 |
| 31 | state_HI | 2666 non-null | uint8 |
| 32 | state_IA | 2666 non-null | uint8 |
| 33 | state_ID | 2666 non-null | uint8 |
| 34 | state_IL | 2666 non-null | uint8 |
| 35 | state_IN | 2666 non-null | uint8 |
| 36 | state_KS | 2666 non-null | uint8 |
| 37 | state_KY | 2666 non-null | uint8 |
| 38 | state_LA | 2666 non-null | uint8 |
| 39 | state_MA | 2666 non-null | uint8 |
| 40 | state_MD | 2666 non-null | uint8 |
| 41 | state_ME | 2666 non-null | uint8 |
| 42 | state_MI | 2666 non-null | uint8 |
| 43 | state_MN | 2666 non-null | uint8 |
| 44 | state_MO | 2666 non-null | uint8 |
| 45 | state_MS | 2666 non-null | uint8 |
| 46 | state_MT | 2666 non-null | uint8 |
| 47 | state_NC | 2666 non-null | uint8 |
| 48 | state_ND | 2666 non-null | uint8 |
| 49 | state_NE | 2666 non-null | uint8 |
| 50 | state_NH | 2666 non-null | uint8 |
| 51 | state_NJ | 2666 non-null | uint8 |
| 52 | state_NM | 2666 non-null | uint8 |
| 53 | state_NV | 2666 non-null | uint8 |
| 54 | state_NY | 2666 non-null | uint8 |
| 55 | state_OH | 2666 non-null | uint8 |
| 56 | state_OK | 2666 non-null | uint8 |
| 57 | state_OR | 2666 non-null | uint8 |
| 58 | state_PA | 2666 non-null | uint8 |
| 59 | state_RI | 2666 non-null | uint8 |
| 60 | state_SC | 2666 non-null | uint8 |
| 61 | state_SD | 2666 non-null | uint8 |
| 62 | state_TN | 2666 non-null | uint8 |
| 63 | state_TX | 2666 non-null | uint8 |
| 64 | state_UT | 2666 non-null | uint8 |
| 65 | state_VA | 2666 non-null | uint8 |
| 66 | state_VT | 2666 non-null | uint8 |
| 67 | state_WA | 2666 non-null | uint8 |
| 68 | state_WI | 2666 non-null | uint8 |
| 69 | state_WV | 2666 non-null | uint8 |
| 70 | state_WY | 2666 non-null | uint8 |
| 71 | area_code_408 | 2666 non-null | uint8 |
| 72 | area_code_415 | 2666 non-null | uint8 |
| 73 | area_code_510 | 2666 non-null | uint8 |

dtypes: float64(4), int64(8), uint8(62) memory usage: 411.5 KB
auto-sklearn (an AutoML toolkit built on scikit-learn) tests combinations of several model types, preprocessing steps, and hyperparameter settings. When compute hours are cheaper than person-hours, this is the place to start. If a simple model fits, don't bother spending weeks on a neural net.
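A search like this could be launched roughly as follows. This is only a sketch: the helper name `fit_automl` and both time budgets are assumptions (the budgets actually used are not shown here), and the F1 metric is taken from the run summary printed further down. The import sits inside the function so the snippet can be loaded on a machine where auto-sklearn is not installed.

```python
def fit_automl(X_train, y_train, time_budget_s=3600, per_run_s=360):
    """Hypothetical helper: fit an auto-sklearn ensemble optimized for F1."""
    # Imported lazily; auto-sklearn must be installed for the call to succeed.
    import autosklearn.classification
    import autosklearn.metrics

    automl = autosklearn.classification.AutoSklearnClassifier(
        time_left_for_this_task=time_budget_s,  # total search budget in seconds (assumed)
        per_run_time_limit=per_run_s,           # cap per candidate pipeline in seconds (assumed)
        metric=autosklearn.metrics.f1,          # optimize F1 rather than plain accuracy
    )
    automl.fit(X_train, y_train)
    return automl

# After fitting:
#   automl.sprint_statistics()  -> text summary of the search
#   automl.show_models()        -> description of the weighted ensemble
#   automl.cv_results_          -> per-run scores and configurations
```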
Accuracy score 0.9685157421289355 Balanced Accuracy score 0.9095913801224682
auto-sklearn results: Dataset name: 0dd0a830-77d9-11eb-82f2-0242ac110002 Metric: f1 Best validation score: 0.748899 Number of target algorithm runs: 109 Number of successful target algorithm runs: 98 Number of crashed target algorithm runs: 0 Number of target algorithms that exceeded the time limit: 10 Number of target algorithms that exceeded the memory limit: 1
"[(0.260000, SimpleClassificationPipeline({'balancing:strategy': 'none', 'classifier:__choice__': 'random_forest', 'data_preprocessing:categorical_transformer:categorical_encoding:__choice__': 'no_encoding', 'data_preprocessing:categorical_transformer:category_coalescence:__choice__': 'minority_coalescer', 'data_preprocessing:numerical_transformer:imputation:strategy': 'median', 'data_preprocessing:numerical_transformer:rescaling:__choice__': 'none', 'feature_preprocessor:__choice__': 'no_preprocessing', 'classifier:random_forest:bootstrap': 'False', 'classifier:random_forest:criterion': 'entropy', 'classifier:random_forest:max_depth': 'None', 'classifier:random_forest:max_features': 0.9282660347661039, 'classifier:random_forest:max_leaf_nodes': 'None', 'classifier:random_forest:min_impurity_decrease': 0.0, 'classifier:random_forest:min_samples_leaf': 15, 'classifier:random_forest:min_samples_split': 19, 'classifier:random_forest:min_weight_fraction_leaf': 0.0, 'data_preprocessing:categorical_transformer:category_coalescence:minority_coalescer:minimum_fraction': 0.01235670698409006},\ndataset_properties={\n 'task': 1,\n 'sparse': False,\n 'multilabel': False,\n 'multiclass': False,\n 'target_type': 'classification',\n 'signed': False})),\n(0.120000, SimpleClassificationPipeline({'balancing:strategy': 'weighting', 'classifier:__choice__': 'adaboost', 'data_preprocessing:categorical_transformer:categorical_encoding:__choice__': 'one_hot_encoding', 'data_preprocessing:categorical_transformer:category_coalescence:__choice__': 'no_coalescense', 'data_preprocessing:numerical_transformer:imputation:strategy': 'mean', 'data_preprocessing:numerical_transformer:rescaling:__choice__': 'standardize', 'feature_preprocessor:__choice__': 'extra_trees_preproc_for_classification', 'classifier:adaboost:algorithm': 'SAMME', 'classifier:adaboost:learning_rate': 0.11793867099324692, 'classifier:adaboost:max_depth': 9, 'classifier:adaboost:n_estimators': 168, 
'feature_preprocessor:extra_trees_preproc_for_classification:bootstrap': 'True', 'feature_preprocessor:extra_trees_preproc_for_classification:criterion': 'gini', 'feature_preprocessor:extra_trees_preproc_for_classification:max_depth': 'None', 'feature_preprocessor:extra_trees_preproc_for_classification:max_features': 0.8033143308258008, 'feature_preprocessor:extra_trees_preproc_for_classification:max_leaf_nodes': 'None', 'feature_preprocessor:extra_trees_preproc_for_classification:min_impurity_decrease': 0.0, 'feature_preprocessor:extra_trees_preproc_for_classification:min_samples_leaf': 6, 'feature_preprocessor:extra_trees_preproc_for_classification:min_samples_split': 10, 'feature_preprocessor:extra_trees_preproc_for_classification:min_weight_fraction_leaf': 0.0, 'feature_preprocessor:extra_trees_preproc_for_classification:n_estimators': 100},\ndataset_properties={\n 'task': 1,\n 'sparse': False,\n 'multilabel': False,\n 'multiclass': False,\n 'target_type': 'classification',\n 'signed': False})),\n(0.100000, SimpleClassificationPipeline({'balancing:strategy': 'weighting', 'classifier:__choice__': 'adaboost', 'data_preprocessing:categorical_transformer:categorical_encoding:__choice__': 'one_hot_encoding', 'data_preprocessing:categorical_transformer:category_coalescence:__choice__': 'minority_coalescer', 'data_preprocessing:numerical_transformer:imputation:strategy': 'most_frequent', 'data_preprocessing:numerical_transformer:rescaling:__choice__': 'minmax', 'feature_preprocessor:__choice__': 'feature_agglomeration', 'classifier:adaboost:algorithm': 'SAMME', 'classifier:adaboost:learning_rate': 0.03734246906377268, 'classifier:adaboost:max_depth': 2, 'classifier:adaboost:n_estimators': 416, 'data_preprocessing:categorical_transformer:category_coalescence:minority_coalescer:minimum_fraction': 0.01340273457911319, 'feature_preprocessor:feature_agglomeration:affinity': 'euclidean', 'feature_preprocessor:feature_agglomeration:linkage': 'ward', 
'feature_preprocessor:feature_agglomeration:n_clusters': 22, 'feature_preprocessor:feature_agglomeration:pooling_func': 'mean'},\ndataset_properties={\n 'task': 1,\n 'sparse': False,\n 'multilabel': False,\n 'multiclass': False,\n 'target_type': 'classification',\n 'signed': False})),\n(0.100000, SimpleClassificationPipeline({'balancing:strategy': 'weighting', 'classifier:__choice__': 'random_forest', 'data_preprocessing:categorical_transformer:categorical_encoding:__choice__': 'one_hot_encoding', 'data_preprocessing:categorical_transformer:category_coalescence:__choice__': 'no_coalescense', 'data_preprocessing:numerical_transformer:imputation:strategy': 'mean', 'data_preprocessing:numerical_transformer:rescaling:__choice__': 'none', 'feature_preprocessor:__choice__': 'polynomial', 'classifier:random_forest:bootstrap': 'True', 'classifier:random_forest:criterion': 'gini', 'classifier:random_forest:max_depth': 'None', 'classifier:random_forest:max_features': 0.2540716596996243, 'classifier:random_forest:max_leaf_nodes': 'None', 'classifier:random_forest:min_impurity_decrease': 0.0, 'classifier:random_forest:min_samples_leaf': 8, 'classifier:random_forest:min_samples_split': 3, 'classifier:random_forest:min_weight_fraction_leaf': 0.0, 'feature_preprocessor:polynomial:degree': 2, 'feature_preprocessor:polynomial:include_bias': 'True', 'feature_preprocessor:polynomial:interaction_only': 'False'},\ndataset_properties={\n 'task': 1,\n 'sparse': False,\n 'multilabel': False,\n 'multiclass': False,\n 'target_type': 'classification',\n 'signed': False})),\n(0.080000, SimpleClassificationPipeline({'balancing:strategy': 'none', 'classifier:__choice__': 'adaboost', 'data_preprocessing:categorical_transformer:categorical_encoding:__choice__': 'no_encoding', 'data_preprocessing:categorical_transformer:category_coalescence:__choice__': 'minority_coalescer', 'data_preprocessing:numerical_transformer:imputation:strategy': 'median', 
'data_preprocessing:numerical_transformer:rescaling:__choice__': 'quantile_transformer', 'feature_preprocessor:__choice__': 'extra_trees_preproc_for_classification', 'classifier:adaboost:algorithm': 'SAMME', 'classifier:adaboost:learning_rate': 0.466586302641991, 'classifier:adaboost:max_depth': 4, 'classifier:adaboost:n_estimators': 107, 'data_preprocessing:categorical_transformer:category_coalescence:minority_coalescer:minimum_fraction': 0.1400876273288978, 'data_preprocessing:numerical_transformer:rescaling:quantile_transformer:n_quantiles': 573, 'data_preprocessing:numerical_transformer:rescaling:quantile_transformer:output_distribution': 'normal', 'feature_preprocessor:extra_trees_preproc_for_classification:bootstrap': 'False', 'feature_preprocessor:extra_trees_preproc_for_classification:criterion': 'gini', 'feature_preprocessor:extra_trees_preproc_for_classification:max_depth': 'None', 'feature_preprocessor:extra_trees_preproc_for_classification:max_features': 0.4390333141263396, 'feature_preprocessor:extra_trees_preproc_for_classification:max_leaf_nodes': 'None', 'feature_preprocessor:extra_trees_preproc_for_classification:min_impurity_decrease': 0.0, 'feature_preprocessor:extra_trees_preproc_for_classification:min_samples_leaf': 18, 'feature_preprocessor:extra_trees_preproc_for_classification:min_samples_split': 5, 'feature_preprocessor:extra_trees_preproc_for_classification:min_weight_fraction_leaf': 0.0, 'feature_preprocessor:extra_trees_preproc_for_classification:n_estimators': 100},\ndataset_properties={\n 'task': 1,\n 'sparse': False,\n 'multilabel': False,\n 'multiclass': False,\n 'target_type': 'classification',\n 'signed': False})),\n(0.080000, SimpleClassificationPipeline({'balancing:strategy': 'weighting', 'classifier:__choice__': 'adaboost', 'data_preprocessing:categorical_transformer:categorical_encoding:__choice__': 'one_hot_encoding', 'data_preprocessing:categorical_transformer:category_coalescence:__choice__': 'minority_coalescer', 
'data_preprocessing:numerical_transformer:imputation:strategy': 'median', 'data_preprocessing:numerical_transformer:rescaling:__choice__': 'standardize', 'feature_preprocessor:__choice__': 'extra_trees_preproc_for_classification', 'classifier:adaboost:algorithm': 'SAMME', 'classifier:adaboost:learning_rate': 0.1482280371992092, 'classifier:adaboost:max_depth': 5, 'classifier:adaboost:n_estimators': 125, 'data_preprocessing:categorical_transformer:category_coalescence:minority_coalescer:minimum_fraction': 0.00822701962768376, 'feature_preprocessor:extra_trees_preproc_for_classification:bootstrap': 'False', 'feature_preprocessor:extra_trees_preproc_for_classification:criterion': 'entropy', 'feature_preprocessor:extra_trees_preproc_for_classification:max_depth': 'None', 'feature_preprocessor:extra_trees_preproc_for_classification:max_features': 0.860327693806392, 'feature_preprocessor:extra_trees_preproc_for_classification:max_leaf_nodes': 'None', 'feature_preprocessor:extra_trees_preproc_for_classification:min_impurity_decrease': 0.0, 'feature_preprocessor:extra_trees_preproc_for_classification:min_samples_leaf': 14, 'feature_preprocessor:extra_trees_preproc_for_classification:min_samples_split': 6, 'feature_preprocessor:extra_trees_preproc_for_classification:min_weight_fraction_leaf': 0.0, 'feature_preprocessor:extra_trees_preproc_for_classification:n_estimators': 100},\ndataset_properties={\n 'task': 1,\n 'sparse': False,\n 'multilabel': False,\n 'multiclass': False,\n 'target_type': 'classification',\n 'signed': False})),\n(0.040000, SimpleClassificationPipeline({'balancing:strategy': 'weighting', 'classifier:__choice__': 'random_forest', 'data_preprocessing:categorical_transformer:categorical_encoding:__choice__': 'no_encoding', 'data_preprocessing:categorical_transformer:category_coalescence:__choice__': 'no_coalescense', 'data_preprocessing:numerical_transformer:imputation:strategy': 'mean', 'data_preprocessing:numerical_transformer:rescaling:__choice__': 
'quantile_transformer', 'feature_preprocessor:__choice__': 'polynomial', 'classifier:random_forest:bootstrap': 'True', 'classifier:random_forest:criterion': 'gini', 'classifier:random_forest:max_depth': 'None', 'classifier:random_forest:max_features': 0.6556617099079364, 'classifier:random_forest:max_leaf_nodes': 'None', 'classifier:random_forest:min_impurity_decrease': 0.0, 'classifier:random_forest:min_samples_leaf': 5, 'classifier:random_forest:min_samples_split': 3, 'classifier:random_forest:min_weight_fraction_leaf': 0.0, 'data_preprocessing:numerical_transformer:rescaling:quantile_transformer:n_quantiles': 1096, 'data_preprocessing:numerical_transformer:rescaling:quantile_transformer:output_distribution': 'normal', 'feature_preprocessor:polynomial:degree': 2, 'feature_preprocessor:polynomial:include_bias': 'True', 'feature_preprocessor:polynomial:interaction_only': 'True'},\ndataset_properties={\n 'task': 1,\n 'sparse': False,\n 'multilabel': False,\n 'multiclass': False,\n 'target_type': 'classification',\n 'signed': False})),\n(0.040000, SimpleClassificationPipeline({'balancing:strategy': 'none', 'classifier:__choice__': 'adaboost', 'data_preprocessing:categorical_transformer:categorical_encoding:__choice__': 'no_encoding', 'data_preprocessing:categorical_transformer:category_coalescence:__choice__': 'no_coalescense', 'data_preprocessing:numerical_transformer:imputation:strategy': 'mean', 'data_preprocessing:numerical_transformer:rescaling:__choice__': 'standardize', 'feature_preprocessor:__choice__': 'extra_trees_preproc_for_classification', 'classifier:adaboost:algorithm': 'SAMME.R', 'classifier:adaboost:learning_rate': 0.4149400871441459, 'classifier:adaboost:max_depth': 3, 'classifier:adaboost:n_estimators': 325, 'feature_preprocessor:extra_trees_preproc_for_classification:bootstrap': 'True', 'feature_preprocessor:extra_trees_preproc_for_classification:criterion': 'gini', 'feature_preprocessor:extra_trees_preproc_for_classification:max_depth': 'None', 
'feature_preprocessor:extra_trees_preproc_for_classification:max_features': 0.45156307069746615, 'feature_preprocessor:extra_trees_preproc_for_classification:max_leaf_nodes': 'None', 'feature_preprocessor:extra_trees_preproc_for_classification:min_impurity_decrease': 0.0, 'feature_preprocessor:extra_trees_preproc_for_classification:min_samples_leaf': 1, 'feature_preprocessor:extra_trees_preproc_for_classification:min_samples_split': 3, 'feature_preprocessor:extra_trees_preproc_for_classification:min_weight_fraction_leaf': 0.0, 'feature_preprocessor:extra_trees_preproc_for_classification:n_estimators': 100},\ndataset_properties={\n 'task': 1,\n 'sparse': False,\n 'multilabel': False,\n 'multiclass': False,\n 'target_type': 'classification',\n 'signed': False})),\n(0.040000, SimpleClassificationPipeline({'balancing:strategy': 'weighting', 'classifier:__choice__': 'random_forest', 'data_preprocessing:categorical_transformer:categorical_encoding:__choice__': 'no_encoding', 'data_preprocessing:categorical_transformer:category_coalescence:__choice__': 'minority_coalescer', 'data_preprocessing:numerical_transformer:imputation:strategy': 'median', 'data_preprocessing:numerical_transformer:rescaling:__choice__': 'robust_scaler', 'feature_preprocessor:__choice__': 'extra_trees_preproc_for_classification', 'classifier:random_forest:bootstrap': 'False', 'classifier:random_forest:criterion': 'entropy', 'classifier:random_forest:max_depth': 'None', 'classifier:random_forest:max_features': 0.7073993103058536, 'classifier:random_forest:max_leaf_nodes': 'None', 'classifier:random_forest:min_impurity_decrease': 0.0, 'classifier:random_forest:min_samples_leaf': 1, 'classifier:random_forest:min_samples_split': 2, 'classifier:random_forest:min_weight_fraction_leaf': 0.0, 'data_preprocessing:categorical_transformer:category_coalescence:minority_coalescer:minimum_fraction': 0.009191354223372805, 'data_preprocessing:numerical_transformer:rescaling:robust_scaler:q_max': 0.7994739639717848, 
'data_preprocessing:numerical_transformer:rescaling:robust_scaler:q_min': 0.02564011515426391, 'feature_preprocessor:extra_trees_preproc_for_classification:bootstrap': 'False', 'feature_preprocessor:extra_trees_preproc_for_classification:criterion': 'gini', 'feature_preprocessor:extra_trees_preproc_for_classification:max_depth': 'None', 'feature_preprocessor:extra_trees_preproc_for_classification:max_features': 0.9894287030139614, 'feature_preprocessor:extra_trees_preproc_for_classification:max_leaf_nodes': 'None', 'feature_preprocessor:extra_trees_preproc_for_classification:min_impurity_decrease': 0.0, 'feature_preprocessor:extra_trees_preproc_for_classification:min_samples_leaf': 20, 'feature_preprocessor:extra_trees_preproc_for_classification:min_samples_split': 19, 'feature_preprocessor:extra_trees_preproc_for_classification:min_weight_fraction_leaf': 0.0, 'feature_preprocessor:extra_trees_preproc_for_classification:n_estimators': 100},\ndataset_properties={\n 'task': 1,\n 'sparse': False,\n 'multilabel': False,\n 'multiclass': False,\n 'target_type': 'classification',\n 'signed': False})),\n(0.040000, SimpleClassificationPipeline({'balancing:strategy': 'weighting', 'classifier:__choice__': 'random_forest', 'data_preprocessing:categorical_transformer:categorical_encoding:__choice__': 'one_hot_encoding', 'data_preprocessing:categorical_transformer:category_coalescence:__choice__': 'minority_coalescer', 'data_preprocessing:numerical_transformer:imputation:strategy': 'mean', 'data_preprocessing:numerical_transformer:rescaling:__choice__': 'robust_scaler', 'feature_preprocessor:__choice__': 'no_preprocessing', 'classifier:random_forest:bootstrap': 'False', 'classifier:random_forest:criterion': 'entropy', 'classifier:random_forest:max_depth': 'None', 'classifier:random_forest:max_features': 0.7323115919225983, 'classifier:random_forest:max_leaf_nodes': 'None', 'classifier:random_forest:min_impurity_decrease': 0.0, 'classifier:random_forest:min_samples_leaf': 15, 
'classifier:random_forest:min_samples_split': 6, 'classifier:random_forest:min_weight_fraction_leaf': 0.0, 'data_preprocessing:categorical_transformer:category_coalescence:minority_coalescer:minimum_fraction': 0.011901034843417571, 'data_preprocessing:numerical_transformer:rescaling:robust_scaler:q_max': 0.7818500358383581, 'data_preprocessing:numerical_transformer:rescaling:robust_scaler:q_min': 0.20068746139723115},\ndataset_properties={\n 'task': 1,\n 'sparse': False,\n 'multilabel': False,\n 'multiclass': False,\n 'target_type': 'classification',\n 'signed': False})),\n(0.020000, SimpleClassificationPipeline({'balancing:strategy': 'none', 'classifier:__choice__': 'adaboost', 'data_preprocessing:categorical_transformer:categorical_encoding:__choice__': 'no_encoding', 'data_preprocessing:categorical_transformer:category_coalescence:__choice__': 'minority_coalescer', 'data_preprocessing:numerical_transformer:imputation:strategy': 'median', 'data_preprocessing:numerical_transformer:rescaling:__choice__': 'standardize', 'feature_preprocessor:__choice__': 'extra_trees_preproc_for_classification', 'classifier:adaboost:algorithm': 'SAMME', 'classifier:adaboost:learning_rate': 0.10000000000000002, 'classifier:adaboost:max_depth': 9, 'classifier:adaboost:n_estimators': 103, 'data_preprocessing:categorical_transformer:category_coalescence:minority_coalescer:minimum_fraction': 0.014782760023936586, 'feature_preprocessor:extra_trees_preproc_for_classification:bootstrap': 'False', 'feature_preprocessor:extra_trees_preproc_for_classification:criterion': 'gini', 'feature_preprocessor:extra_trees_preproc_for_classification:max_depth': 'None', 'feature_preprocessor:extra_trees_preproc_for_classification:max_features': 0.9030281967182856, 'feature_preprocessor:extra_trees_preproc_for_classification:max_leaf_nodes': 'None', 'feature_preprocessor:extra_trees_preproc_for_classification:min_impurity_decrease': 0.0, 
'feature_preprocessor:extra_trees_preproc_for_classification:min_samples_leaf': 13, 'feature_preprocessor:extra_trees_preproc_for_classification:min_samples_split': 5, 'feature_preprocessor:extra_trees_preproc_for_classification:min_weight_fraction_leaf': 0.0, 'feature_preprocessor:extra_trees_preproc_for_classification:n_estimators': 100},\ndataset_properties={\n 'task': 1,\n 'sparse': False,\n 'multilabel': False,\n 'multiclass': False,\n 'target_type': 'classification',\n 'signed': False})),\n(0.020000, SimpleClassificationPipeline({'balancing:strategy': 'none', 'classifier:__choice__': 'gradient_boosting', 'data_preprocessing:categorical_transformer:categorical_encoding:__choice__': 'one_hot_encoding', 'data_preprocessing:categorical_transformer:category_coalescence:__choice__': 'no_coalescense', 'data_preprocessing:numerical_transformer:imputation:strategy': 'mean', 'data_preprocessing:numerical_transformer:rescaling:__choice__': 'none', 'feature_preprocessor:__choice__': 'feature_agglomeration', 'classifier:gradient_boosting:early_stop': 'valid', 'classifier:gradient_boosting:l2_regularization': 1.7108930238344161e-10, 'classifier:gradient_boosting:learning_rate': 0.010827728124541558, 'classifier:gradient_boosting:loss': 'auto', 'classifier:gradient_boosting:max_bins': 255, 'classifier:gradient_boosting:max_depth': 'None', 'classifier:gradient_boosting:max_leaf_nodes': 25, 'classifier:gradient_boosting:min_samples_leaf': 4, 'classifier:gradient_boosting:scoring': 'loss', 'classifier:gradient_boosting:tol': 1e-07, 'feature_preprocessor:feature_agglomeration:affinity': 'euclidean', 'feature_preprocessor:feature_agglomeration:linkage': 'ward', 'feature_preprocessor:feature_agglomeration:n_clusters': 164, 'feature_preprocessor:feature_agglomeration:pooling_func': 'mean', 'classifier:gradient_boosting:n_iter_no_change': 19, 'classifier:gradient_boosting:validation_fraction': 0.1759114608225653},\ndataset_properties={\n 'task': 1,\n 'sparse': False,\n 
'multilabel': False,\n 'multiclass': False,\n 'target_type': 'classification',\n 'signed': False})),\n(0.020000, SimpleClassificationPipeline({'balancing:strategy': 'weighting', 'classifier:__choice__': 'gradient_boosting', 'data_preprocessing:categorical_transformer:categorical_encoding:__choice__': 'one_hot_encoding', 'data_preprocessing:categorical_transformer:category_coalescence:__choice__': 'no_coalescense', 'data_preprocessing:numerical_transformer:imputation:strategy': 'mean', 'data_preprocessing:numerical_transformer:rescaling:__choice__': 'quantile_transformer', 'feature_preprocessor:__choice__': 'no_preprocessing', 'classifier:gradient_boosting:early_stop': 'valid', 'classifier:gradient_boosting:l2_regularization': 0.001747314652832166, 'classifier:gradient_boosting:learning_rate': 0.011208309433304312, 'classifier:gradient_boosting:loss': 'auto', 'classifier:gradient_boosting:max_bins': 255, 'classifier:gradient_boosting:max_depth': 'None', 'classifier:gradient_boosting:max_leaf_nodes': 12, 'classifier:gradient_boosting:min_samples_leaf': 47, 'classifier:gradient_boosting:scoring': 'loss', 'classifier:gradient_boosting:tol': 1e-07, 'data_preprocessing:numerical_transformer:rescaling:quantile_transformer:n_quantiles': 387, 'data_preprocessing:numerical_transformer:rescaling:quantile_transformer:output_distribution': 'uniform', 'classifier:gradient_boosting:n_iter_no_change': 6, 'classifier:gradient_boosting:validation_fraction': 0.3173983289937482},\ndataset_properties={\n 'task': 1,\n 'sparse': False,\n 'multilabel': False,\n 'multiclass': False,\n 'target_type': 'classification',\n 'signed': False})),\n(0.020000, SimpleClassificationPipeline({'balancing:strategy': 'weighting', 'classifier:__choice__': 'random_forest', 'data_preprocessing:categorical_transformer:categorical_encoding:__choice__': 'one_hot_encoding', 'data_preprocessing:categorical_transformer:category_coalescence:__choice__': 'minority_coalescer', 
'data_preprocessing:numerical_transformer:imputation:strategy': 'most_frequent', 'data_preprocessing:numerical_transformer:rescaling:__choice__': 'robust_scaler', 'feature_preprocessor:__choice__': 'no_preprocessing', 'classifier:random_forest:bootstrap': 'True', 'classifier:random_forest:criterion': 'entropy', 'classifier:random_forest:max_depth': 'None', 'classifier:random_forest:max_features': 0.6779841015398226, 'classifier:random_forest:max_leaf_nodes': 'None', 'classifier:random_forest:min_impurity_decrease': 0.0, 'classifier:random_forest:min_samples_leaf': 14, 'classifier:random_forest:min_samples_split': 14, 'classifier:random_forest:min_weight_fraction_leaf': 0.0, 'data_preprocessing:categorical_transformer:category_coalescence:minority_coalescer:minimum_fraction': 0.03961232028373377, 'data_preprocessing:numerical_transformer:rescaling:robust_scaler:q_max': 0.9772109830437746, 'data_preprocessing:numerical_transformer:rescaling:robust_scaler:q_min': 0.13300503334706695},\ndataset_properties={\n 'task': 1,\n 'sparse': False,\n 'multilabel': False,\n 'multiclass': False,\n 'target_type': 'classification',\n 'signed': False})),\n(0.020000, SimpleClassificationPipeline({'balancing:strategy': 'none', 'classifier:__choice__': 'qda', 'data_preprocessing:categorical_transformer:categorical_encoding:__choice__': 'no_encoding', 'data_preprocessing:categorical_transformer:category_coalescence:__choice__': 'minority_coalescer', 'data_preprocessing:numerical_transformer:imputation:strategy': 'most_frequent', 'data_preprocessing:numerical_transformer:rescaling:__choice__': 'none', 'feature_preprocessor:__choice__': 'extra_trees_preproc_for_classification', 'classifier:qda:reg_param': 0.5778490489524517, 'data_preprocessing:categorical_transformer:category_coalescence:minority_coalescer:minimum_fraction': 0.06885644779757376, 'feature_preprocessor:extra_trees_preproc_for_classification:bootstrap': 'False', 
'feature_preprocessor:extra_trees_preproc_for_classification:criterion': 'gini', 'feature_preprocessor:extra_trees_preproc_for_classification:max_depth': 'None', 'feature_preprocessor:extra_trees_preproc_for_classification:max_features': 0.8877058835399982, 'feature_preprocessor:extra_trees_preproc_for_classification:max_leaf_nodes': 'None', 'feature_preprocessor:extra_trees_preproc_for_classification:min_impurity_decrease': 0.0, 'feature_preprocessor:extra_trees_preproc_for_classification:min_samples_leaf': 6, 'feature_preprocessor:extra_trees_preproc_for_classification:min_samples_split': 6, 'feature_preprocessor:extra_trees_preproc_for_classification:min_weight_fraction_leaf': 0.0, 'feature_preprocessor:extra_trees_preproc_for_classification:n_estimators': 100},\ndataset_properties={\n 'task': 1,\n 'sparse': False,\n 'multilabel': False,\n 'multiclass': False,\n 'target_type': 'classification',\n 'signed': False})),\n]"
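The dump above is hard to read at a glance. A small sketch like the following totals the ensemble weight per classifier family. The weights and classifier names are transcribed by hand from that output, and the helper name `weight_by_classifier` is hypothetical.

```python
from collections import defaultdict

# (ensemble weight, classifier) pairs, in the order they appear in the dump.
ENSEMBLE = [
    (0.26, "random_forest"), (0.12, "adaboost"), (0.10, "adaboost"),
    (0.10, "random_forest"), (0.08, "adaboost"), (0.08, "adaboost"),
    (0.04, "random_forest"), (0.04, "adaboost"), (0.04, "random_forest"),
    (0.04, "random_forest"), (0.02, "adaboost"), (0.02, "gradient_boosting"),
    (0.02, "gradient_boosting"), (0.02, "random_forest"), (0.02, "qda"),
]

def weight_by_classifier(ensemble):
    """Sum ensemble weights per classifier name, largest share first."""
    totals = defaultdict(float)
    for weight, name in ensemble:
        totals[name] += weight
    ordered = sorted(totals.items(), key=lambda item: -item[1])
    return {name: round(total, 2) for name, total in ordered}

shares = weight_by_classifier(ENSEMBLE)
print(shares)
# {'random_forest': 0.5, 'adaboost': 0.44, 'gradient_boosting': 0.04, 'qda': 0.02}
```

Read this way, random-forest and AdaBoost pipelines carry nearly all of the ensemble weight.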
The output below is truncated because the full text is extremely long.
"{'mean_test_score': array([0.67317073, 0.65346535, 0. , 0.58358663, 0.29288703,\n 0. , 0.65822785, 0.31034483, 0. , 0. ,\n 0.41628959, 0.58591549, 0.68444444, 0.61403509, 0.6433121 ,\n 0.62650602, 0.69955157, 0.68544601, 0.49342105, 0.5 ,\n 0.51882845, 0. , 0.64761905, 0.68613139, 0.38190955,\n 0.71615721, 0.69724771, 0. , 0.464 , 0.72340426,\n 0.46027397, 0.52991453, 0.51737452, 0.52671756, 0.35064935,\n 0.28248588, 0.71372549, 0.7037037 , 0.4516129 , 0.28909953,\n 0.25914936, 0.24203822, 0.63316583, 0.60377358, 0.7027027 ,\n 0.736 , 0. , 0.68181818, 0.68275862, 0.29106628,\n 0.55457227, 0.66346154, 0.45086705, 0.54347826, 0.33014354,\n 0.39285714, 0.58646617, 0.07194245, 0.34615385, 0.3255814 ,\n 0.68273092, 0.66141732, 0.72881356, 0.44692737, 0.67811159,\n 0.58095238, 0.54375 , 0.29181495, 0. , 0.70642202,\n 0. , 0.47586207, 0. ,"
array([0, 0, 1, ..., 0, 0, 1])
Accuracy score 0.9609902475618904 Balanced Accuracy score 0.8942064821461806
Final F1 Score 0.8594594594594596
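The three test scores above are all functions of one confusion matrix, which makes them easy to cross-check. The sketch below assumes churn = 1 is the positive class. The counts are a reconstruction that happens to reproduce the reported scores, not values read from the notebook, and the helper name `scores` is hypothetical.

```python
def scores(tn, fp, fn, tp):
    """Return (accuracy, balanced accuracy, F1) for a binary confusion matrix."""
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    recall = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)     # true negative rate
    balanced_accuracy = (recall + specificity) / 2
    f1 = 2 * tp / (2 * tp + fp + fn)
    return accuracy, balanced_accuracy, f1

# Assumed counts (reconstructed, 1,333 rows in total).
acc, bal_acc, f1 = scores(tn=1122, fp=12, fn=40, tp=159)
print(f"accuracy={acc:.4f} balanced_accuracy={bal_acc:.4f} f1={f1:.4f}")
# accuracy=0.9610 balanced_accuracy=0.8942 f1=0.8595
```

Balanced accuracy sits below plain accuracy because recall on the minority class is lower than specificity on the majority class.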
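As usual, random forest is a solid model. Type I errors (FP: predicting that a customer will leave when that customer actually stays) are likely the preferred error, within reasonable levels, because a lost customer probably costs more than "playing it safe" by advertising to too many potential "defectors". Note that the test scores above (accuracy 0.961, balanced accuracy 0.894, F1 0.859) are jointly consistent with 12 FP and 40 FN out of 1,333 test rows if churn = 1 is the positive class, so the final model appears to lean toward type II errors rather than type I; the decision threshold or class weighting may need adjusting to favor the preferred error.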
If I were to spend more time, I would investigate class balancing further since the DV is imbalanced.
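One common starting point for class balancing is to weight each class inversely to its frequency. The sketch below uses the same formula as scikit-learn's `class_weight="balanced"` option, `n_samples / (n_classes * count)`. The 85/15 label split is hypothetical and the helper name is an assumption.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical imbalanced target: 85 non-churners, 15 churners.
labels = [0] * 85 + [1] * 15
weights = balanced_class_weights(labels)
print(weights)
```

With these weights, each misclassified churner counts for several misclassified non-churners during training, which pushes a model toward catching more of the minority class.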
The highly correlated IVs (next couple of cells) should be considered for removal (review source of variance - possibly through PCA - before deciding)
max_discount is highly correlated with promo_purchaser
acc_spend is highly correlated with acc_orders